Confident Learning
This page contains my reading notes on Confident Learning (CL).
Notation:
Symbols with * refer to the unknown, true labels.
Symbols with \tilde{} refer to the given, noisy labels.
Symbols with \hat{} refer to estimates (outputs of the trained model).
The procedure needs two inputs:
Out-of-sample predicted probabilities \hat{\mathbf{P}}: a matrix of n rows (number of training instances) and m columns (number of labels).
CL requires users to train a model on the training set using cross-validation, so that every instance's predicted probabilities come from a model that never saw it during training.
The model must be able to output predicted probabilities for all possible labels.
The given labels \tilde{\mathbf{y}}: a vector of length n (# of training instances).
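The out-of-sample requirement can be met with cross-validation. Below is a minimal sketch using scikit-learn's `cross_val_predict` (assumed available) on a hypothetical toy dataset; the model choice and data are for illustration only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Hypothetical toy dataset: 40 instances, 2 features, 2 well-separated classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Out-of-sample predicted probabilities: each instance is scored by a
# model trained on folds that exclude it.
pred_probs = cross_val_predict(LogisticRegression(), X, y, cv=5,
                               method="predict_proba")
print(pred_probs.shape)  # (40, 2): n rows, m columns
```

The resulting matrix is the \hat{\mathbf{P}} input described above.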
Five methods to identify instances with noisy labels
1. CL baseline 1: C_{confusion}
An instance is considered to have a noisy label if its given label differs from the label with the largest predicted probability (the argmax).
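This baseline can be sketched in a few lines of numpy; the toy probabilities and labels below are hypothetical.

```python
import numpy as np

# Hypothetical inputs: out-of-sample predicted probabilities for
# n = 5 instances over m = 3 labels, plus the given (noisy) labels.
pred_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.2, 0.3, 0.5],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
])
labels = np.array([0, 1, 1, 0, 2])

# Flag an instance as noisy when its given label disagrees with the
# argmax of its predicted probabilities.
is_noisy = pred_probs.argmax(axis=1) != labels
print(is_noisy)  # instance 2 is flagged: argmax is 2 but given label is 1
```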
2. CL method 2: C_{\tilde{y}, y^{*}}
In this method, a matrix called the confident joint C_{\tilde{y}, y^{*}} is calculated from \hat{\mathbf{P}} and \tilde{\mathbf{y}}.
C_{\tilde{y}, y^{*}} | y^{*} = 0 | y^{*} = 1 | y^{*} = 2 |
---|---|---|---|
\tilde{y} = 0 | 100 | 40 | 20 |
\tilde{y} = 1 | 56 | 60 | 0 |
\tilde{y} = 2 | 32 | 12 | 80 |
To calculate this matrix:
For each label j, compute a threshold t_{j}: the average predicted probability for label j, taken over the instances whose given label is j.
For each instance \mathbf{x}_{k} with given label i, increment the entry C_{\tilde{y}=i, y^{*}=j} by 1, where the candidate true label j is the label with the largest predicted probability among all labels whose predicted probabilities are at least their respective thresholds t_{j}.
This basically means the true label of an instance is taken to be the label whose predicted probability is above that label's average predicted probability.
If more than one such label exists, choose the one with the largest predicted probability.
It is possible that no such label exists, in which case the instance is not counted in the matrix.
Thus, each entry in C_{\tilde{y}, y^{*}} corresponds to a set of training instances.
All instances that fall on the off-diagonal entries of C_{\tilde{y}, y^{*}} are considered to have noisy labels.
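The construction above can be sketched as follows; the toy probabilities, labels, and resulting counts are hypothetical.

```python
import numpy as np

# Hypothetical inputs: n = 6 instances, m = 3 labels.
pred_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.8],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
    [0.4, 0.4, 0.2],
])
labels = np.array([0, 1, 1, 0, 2, 0])
n, m = pred_probs.shape

# Per-label threshold t_j: average predicted probability for label j
# over the instances whose given label is j.
thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(m)])

# Confident joint C[i, j]: count of instances with given label i whose
# confidently-predicted true label is j.
C = np.zeros((m, m), dtype=int)
for k in range(n):
    above = np.where(pred_probs[k] >= thresholds)[0]
    if above.size == 0:
        continue  # no label clears its threshold: skip this instance
    j = above[np.argmax(pred_probs[k, above])]
    C[labels[k], j] += 1

# Instances counted in off-diagonal entries are flagged as noisy.
print(C)
```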
3. CL method 3: Prune by Class (PBC)
In this method and all methods below, another matrix, called the estimate of the joint \hat{Q}_{\tilde{y}, y^{*}}, is calculated from C_{\tilde{y}, y^{*}}.
\hat{Q}_{\tilde{y}, y^{*}} | y^{*} = 0 | y^{*} = 1 | y^{*} = 2 |
---|---|---|---|
\tilde{y} = 0 | 0.25 | 0.1 | 0.05 |
\tilde{y} = 1 | 0.14 | 0.15 | 0 |
\tilde{y} = 2 | 0.08 | 0.03 | 0.2 |
\hat{Q}_{\tilde{y}, y^{*}} is basically the normalized C_{\tilde{y}, y^{*}}: each entry of C_{\tilde{y}, y^{*}} is divided by the total number of training instances.
For each class i, the a instances with the lowest predicted probabilities for label i (among instances with given label i) are considered to have noisy labels, where a is the product of n and the sum of the off-diagonal entries in row i of \hat{Q}_{\tilde{y}, y^{*}}.
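A minimal sketch of PBC, reusing the hypothetical toy data from the confident-joint sketch; the confident joint is hard-coded here for self-containment.

```python
import numpy as np

# Same hypothetical toy data: n = 6 instances, m = 3 labels.
pred_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.8],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
    [0.4, 0.4, 0.2],
])
labels = np.array([0, 1, 1, 0, 2, 0])
n, m = pred_probs.shape

# Hypothetical confident joint, normalized into the estimate of the joint.
C = np.array([[2, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
Q = C / n

noisy = np.zeros(n, dtype=bool)
for i in range(m):
    # Prune a = n * (sum of off-diagonal entries in row i of Q)
    # instances from class i.
    a = int(round(n * (Q[i].sum() - Q[i, i])))
    if a == 0:
        continue
    idx = np.where(labels == i)[0]
    # Flag the a instances with the lowest predicted probability for
    # their own given label i.
    worst = idx[np.argsort(pred_probs[idx, i])[:a]]
    noisy[worst] = True
print(noisy)
```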
4. CL method 4: Prune by Noise Rate (PBNR)
For each off-diagonal entry (i, j) of \hat{Q}_{\tilde{y}, y^{*}}, the n \times \hat{Q}_{\tilde{y}=i, y^{*}=j} instances (with given label i) with the largest margin are considered to have noisy labels, where the margin of an instance \mathbf{x}_{k} with respect to given label i and true label j is \hat{\mathbf{P}}_{k, j} - \hat{\mathbf{P}}_{k, i}.
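A minimal sketch of PBNR on the same hypothetical toy data; again the confident joint is hard-coded for self-containment.

```python
import numpy as np

# Same hypothetical toy data: n = 6 instances, m = 3 labels.
pred_probs = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.6, 0.1],
    [0.1, 0.2, 0.8],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
    [0.4, 0.4, 0.2],
])
labels = np.array([0, 1, 1, 0, 2, 0])
n, m = pred_probs.shape

# Hypothetical confident joint, normalized into the estimate of the joint.
C = np.array([[2, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
Q = C / n

noisy = np.zeros(n, dtype=bool)
for i in range(m):
    for j in range(m):
        if i == j:
            continue
        a = int(round(n * Q[i, j]))  # number of instances to prune for (i, j)
        if a == 0:
            continue
        idx = np.where(labels == i)[0]
        # Margin: how much more probable label j is than the given
        # label i under the model.
        margin = pred_probs[idx, j] - pred_probs[idx, i]
        worst = idx[np.argsort(-margin)[:a]]
        noisy[worst] = True
print(noisy)
```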
5. CL method 5: C + NR
An instance is considered to have a noisy label if both PBC and PBNR consider it to have a noisy label.
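Given the boolean masks produced by PBC and PBNR, C+NR is just their intersection; the masks below are hypothetical.

```python
import numpy as np

# Hypothetical noisy-label masks produced by PBC and PBNR.
pbc_noisy = np.array([False, False, True, False, False, True])
pbnr_noisy = np.array([False, False, True, False, True, False])

# C+NR flags only the instances that both methods flag.
cnr_noisy = pbc_noisy & pbnr_noisy
print(cnr_noisy)  # only instance 2 is flagged by both
```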